2nd Deep Machine Translation Workshop Program Committee Adding Syntactic Structure to Bilingual Terminology for Improved Domain Adaptation
Authors
Abstract
Moses is a well-known representative of the family of phrase-based statistical machine translation systems, which are known to be extremely poor in explicit linguistic knowledge: they operate on flat language representations consisting only of tokens and phrases. Treex, on the other hand, is a highly linguistically motivated NLP toolkit that operates on several layers of language representation, rich in linguistic annotations. Its main application is TectoMT, a hybrid machine translation system with deep syntax transfer. We review a large number of machine translation systems that have been built over the past years by combining Moses and Treex/TectoMT in various ways.
Similar resources
Adding syntactic structure to bilingual terminology for improved domain adaptation
Deep-syntax approaches to machine translation have emerged as an alternative to phrase-based statistical systems. TectoMT is an open source framework for transfer-based MT which works at the deep tectogrammatical level and combines linguistic knowledge and statistical techniques. When adapting to a domain, terminological resources improve results with simple techniques, e.g. force-translating d...
KyotoEBMT System Description for the 2nd Workshop on Asian Translation
This paper introduces the KyotoEBMT example-based machine translation framework. Since last year’s workshop we have replaced input trees with forests, improved alignment, added new features, and introduced bilingual neural network reranking. The major benefits of our system include online example retrieval and flexible reordering. We also use syntactic dependency analysis for both source and ta...
Use of Domain-Specific Language Resources in Machine Translation
In this paper, we address the problem of Machine Translation (MT) for a specialised domain in a language pair for which only a very small domain-specific parallel corpus is available. We conduct a series of experiments using a purely phrase-based SMT (PBSMT) system and a hybrid MT system (TectoMT), testing three different strategies to overcome the problem of the small amount of in-domain train...
Domain Adaptation for Medical Text Translation using Web Resources
This paper describes adapting statistical machine translation (SMT) systems to the medical domain using in-domain and general-domain data as well as web-crawled in-domain resources. In order to complement the limited in-domain corpora, we apply domain-focused web-crawling approaches to acquire in-domain monolingual data and a bilingual lexicon from the Internet. The collected data is used for adapting t...
Enhancing Machine Translation of Academic Course Catalogues with Terminological Resources
This paper describes an approach to translating course unit descriptions from Italian and German into English, using a phrase-based machine translation (MT) system. The genre is very prominent among those requiring translation by universities in European countries in which English is a non-native language. For each language combination, an in-domain bilingual corpus including course unit and de...